Last two sessions (7 May, 14 May): Q&A sessions on Zoom instead of presentations/discussion in the classroom.
Online examination mode
Part I: take-home exercises: No changes. To be handed out on 7 May, to be handed in on 8 June, 16:00.
Part II: project presentations: presentations recorded as ‘screencast’ (voice-over-slides).
Basically still the same requirements: use Rmd to create slides, presentations of 6-7 minutes max., etc. The only difference is how you deliver your presentation.
See previous examples for typical problems in a data analytics context.
Vast variety of potential bottlenecks. Hard to give general advice.
Programming with Big Data
Which basic (already implemented) R functions are more or less suitable as building blocks for the program?
How can we exploit/avoid some of R’s lower-level characteristics in order to implement efficient functions?
Is there a need to interface with a lower-level programming language in order to speed up the code? (advanced topic)
Independent of how we write a statistical procedure in R (or in any other language, for that matter), is there an alternative statistical procedure/algorithm that is faster but delivers approximately the same result?
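The last question can be illustrated with a small sketch. It assumes simulated data (the names `dat`, `n`, `m`, `idx`, `beta_full`, and `beta_sub` are made up for this illustration) and estimates OLS coefficients once on all observations and once on a random subsample. The subsample estimate needs less computation and, in this setting, lands close to the full-data estimate. This is only one of many possible approximations, not a general recommendation.

```r
# Illustrative only: simulated data, hypothetical variable names.
set.seed(1)
n <- 1e5                                   # full sample size
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 0.5 * dat$x + rnorm(n)        # true intercept 2, true slope 0.5

# Estimate on the full data set
beta_full <- coef(lm(y ~ x, data = dat))

# Estimate on a random subsample of m observations
m <- 1e4
idx <- sample.int(n, m)
beta_sub <- coef(lm(y ~ x, data = dat[idx, ]))

# Compare the two sets of estimates side by side
print(rbind(full = beta_full, subsample = beta_sub))
```

How small the subsample can be depends on the noise in the data and on the precision required, so the choice of `m` above is arbitrary.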
Issues to keep in mind
Vectorization.
Memory: avoid copying, pre-allocate memory.
Use built-in primitive (C) functions (caution: not always faster if the aim is precision).
Existing solutions: load additional packages (read.csv() vs. data.table::fread()).
Focus of what follows in this course (approach taken in Walkowiak (2016)).
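A minimal sketch of the first two issues listed above (vectorization and pre-allocation), using base R only. The function names `squares_grow`, `squares_prealloc`, and `squares_vec` are hypothetical and chosen for this illustration. Timings depend on the machine and the R version, so none are quoted here.

```r
# Illustrative only: three ways to compute the squares of 1..n.

# (a) Growing a vector with c() inside a loop:
#     each iteration allocates a new, longer vector and copies the old one.
squares_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# (b) Pre-allocating the result and filling it in place.
squares_prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# (c) Vectorized: the loop runs in compiled code, not in R.
squares_vec <- function(n) {
  seq_len(n)^2
}

n <- 1e4

# All three variants return the same values
stopifnot(isTRUE(all.equal(squares_grow(n), squares_vec(n))))
stopifnot(isTRUE(all.equal(squares_prealloc(n), squares_vec(n))))

# Compare elapsed time on your own machine
print(system.time(squares_grow(n)))
print(system.time(squares_prealloc(n)))
print(system.time(squares_vec(n)))
```

On typical setups one would expect (a) to be the slowest and (c) the fastest, but it is worth running the comparison rather than assuming it.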